MilTech 2007 USING TEXT CLUSTERING FOR INTELLIGENCE CLASSIFICATION

Authors

  • Tomas Berg
  • Christian Mårtenson
  • Pontus Svenson
Abstract

In this paper, we discuss how text mining methods could be used for intelligence analysis. We describe how simple methods from text mining can help intelligence analysts determine where a specific report or analysis fits into the knowledge base (KB), i.e., how it should be classified and which, if any, other documents in the KB it should be linked to. The method works by comparing the vector space model representation of the new document with those of all documents previously stored in the knowledge base. Documents that are sufficiently similar to the new piece of information are displayed to the user, who can then choose to place links between them. A computer tool such as the one suggested here allows the analyst to spend more time analyzing intelligence reports and less time searching for and classifying them. In previous work, we discussed how MilWiki, an improved implementation of the open-source MediaWiki system, could be used as a knowledge base for military purposes. To illustrate the text classification method described in this paper, it has been implemented for MilWiki. To simulate new pieces of information, the prototype allows the user to download articles from Wikipedia. The method, as well as the collaborative work process used in a wiki, could be implemented in any content management system. In addition to describing the text classification method, we also give a brief introduction to text mining and the vector space model of documents.
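The comparison step described in the abstract can be sketched roughly as follows. This is a minimal illustration and not the authors' MilWiki implementation: the abstract does not say which term weighting or similarity measure is used, so TF-IDF weights, cosine similarity, the threshold value of 0.2, and all function names (`tokenize`, `tfidf_vectors`, `cosine`, `suggest_links`) are assumptions made for this example.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (dict: term -> weight) per document."""
    counts = [Counter(tokenize(d)) for d in docs]
    n = len(docs)
    # Document frequency: number of documents in which each term occurs.
    df = Counter(term for c in counts for term in c)
    # Smoothed IDF (an assumed choice) so that terms occurring in every
    # document keep a small positive weight.
    idf = {t: math.log((1 + n) / (1 + df[t])) + 1.0 for t in df}
    return [{t: tf * idf[t] for t, tf in c.items()} for c in counts]


def cosine(u, v):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def suggest_links(new_doc, kb, threshold=0.2):
    """Return (title, score) pairs for KB documents similar to new_doc.

    kb maps a document title to its text. The threshold is a hypothetical
    cut-off for "sufficiently similar"; results are sorted best first.
    """
    titles = list(kb)
    vectors = tfidf_vectors([kb[t] for t in titles] + [new_doc])
    new_vec = vectors[-1]
    scored = [(t, cosine(new_vec, v)) for t, v in zip(titles, vectors[:-1])]
    hits = [(t, s) for t, s in scored if s >= threshold]
    return sorted(hits, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    kb = {
        "Convoy report": "armored convoy observed moving north along the river road",
        "Weather summary": "heavy rain and fog expected over the coastal region",
        "Supply note": "fuel supply delayed at the northern depot",
    }
    for title, score in suggest_links("second armored convoy observed on the river road", kb):
        print(f"{title}: {score:.2f}")
```

In a tool of the kind the abstract describes, the returned titles would be shown to the analyst as link candidates. The decision to create a link stays with the user.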


Similar resources

A Joint Semantic Vector Representation Model for Text Clustering and Classification

Text clustering and classification are two main tasks of text mining. Feature selection plays a key role in the quality of the clustering and classification results. Although word-based features such as term frequency-inverse document frequency (TF-IDF) vectors have been widely used in different applications, their shortcomings in capturing the semantic concepts of text have motivated researchers to use...


An Improved K-Nearest Neighbor with Crow Search Algorithm for Feature Selection in Text Documents Classification

The Internet provides easy access to many kinds of library resources. However, classifying documents within a large amount of data remains a problem, and finding specific documents takes time and effort. Grouping similar documents into specific classes can reduce the time needed to search for the required data, particularly for text documents. This is further facilitated by using Artificial...



Using Clustering for Web Information Extraction

This paper introduces an approach that achieves automated data extraction for semi-structured Web pages by using clustering to group text tokens and data tuples into clusters. This approach uses both HTML and text features of text tokens to detect the similarities between them. After clustering, similar text tokens are expected to be in the same text clusters and labeled with the same text clus...


Application of Clustering Methods in DNA Microarrays

Background: DNA microarray technology has paved the way for investigators to measure the expression of thousands of genes in a short time. Analysis of this large amount of raw data includes normalization, clustering and classification. The present study surveys the application of clustering techniques in DNA microarray analysis. Materials and methods: We analyzed data from the Van’t Veer et al. study dealing with BRCA1...




Publication date: 2007